Research in Computing Science, Vol. 65, pp. 35-50, 2013.
Abstract: Unsupervised dependency parsing is gaining relevance in Natural Language Processing due to the increasing number of utterances available on the Internet. Most current work is based on the Dependency Model with Valence (DMV) or on Extended Valence Grammars (EVGs); in both cases, the dependencies between words are modeled using a fixed automaton structure. We present a framework for the unsupervised induction of dependency structures based on CYK parsing that uses a simple technique for rewriting the training material. Our model is implemented by means of a k-best CYK parser, an inductor for Probabilistic Bilexical Grammars (PBGs), and a simple technique that rewrites the treebank from the k best trees and their probabilities. An important contribution of our work is that the framework accepts any existing algorithm for automata induction, which makes the automata structure fully modifiable. Our experiments showed that it is the size of the training material that influences the parameterization in a predictable manner. This flexibility yielded good results in 8 different languages, in some cases comparable to the state of the art.
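The soft-EM rewriting step (turning the k best trees and their probabilities into new training material) can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: each parse is represented as a head array (`-1` marks the root), the scores returned by the k-best CYK parser are renormalized over the k trees of each sentence, and the rewritten material is reduced to fractional head-dependent counts rather than fed to an automata inductor. The names `soft_counts`, `rewrite_treebank` and `ROOT` are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: heads[i] is the index of the head of word i,
# and -1 means word i is attached to the artificial root.
Parse = Tuple[int, ...]
ROOT = "<ROOT>"  # hypothetical sentinel for the root symbol


def soft_counts(
    sentence: Sequence[str], kbest: List[Tuple[Parse, float]]
) -> Dict[Tuple[str, str], float]:
    """Weight each of the k trees by its renormalized probability and
    accumulate fractional (head, dependent) counts for one sentence."""
    total = sum(prob for _, prob in kbest)
    if total <= 0:
        raise ValueError("k-best probabilities must sum to a positive value")
    counts: Dict[Tuple[str, str], float] = {}
    for heads, prob in kbest:
        weight = prob / total  # posterior restricted to the k trees only
        for dep, head in enumerate(heads):
            head_word = ROOT if head == -1 else sentence[head]
            key = (head_word, sentence[dep])
            counts[key] = counts.get(key, 0.0) + weight
    return counts


def rewrite_treebank(
    corpus: List[Tuple[Sequence[str], List[Tuple[Parse, float]]]]
) -> Counter:
    """Sum the per-sentence fractional counts over the whole corpus; these
    would then serve as training material for the next induction round."""
    totals: Counter = Counter()
    for sentence, kbest in corpus:
        totals.update(soft_counts(sentence, kbest))  # Counter.update adds values
    return totals


if __name__ == "__main__":
    sentence = ["the", "dog", "barks"]
    kbest = [((1, 2, -1), 0.375), ((2, 2, -1), 0.125)]
    print(soft_counts(sentence, kbest))
```

Because each tree contributes one arc per word and the weights of the k trees sum to one, the fractional counts for a sentence always sum to its length. This is the property that makes the rewritten treebank a "soft" version of a hard Viterbi treebank.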
Keywords: Unsupervised dependency parsing, bilexical grammars, soft-EM algorithm.
PDF: A Framework for Unsupervised Dependency Parsing using a Soft-EM Algorithm and Bilexical Grammars